Abstract

The purpose of this paper is to present a discussion of various statistical concepts and techniques 
in light of two propositions. First, researchers need to select analytical techniques that prevent 
them from committing a Type VI error, which is an inconsistency between the research question 
and the statistical analysis. Second, many statistical techniques are interrelated on a conceptual 
level. In addition, a list of resources is presented to assist researchers who want to pursue a more 
detailed study of the issues presented. The topics presented for discussion consist of (a) 
distinguishing between statistical analysis issues and research design issues, (b) examining various 
concerns with the use of structural equation modeling, (c) considering both statistical and 
practical significance of results, (d) reflecting on when and how to control for inflated Type I 
error rates, and (e) distinguishing between research questions that require multivariate analyses 
and those that require univariate analyses. This presentation may provide researchers with more 
in-depth understanding of how a given technique may accurately address one’s research question. 
The Importance of Matching the Analytical Technique with the Research Question: 

A Discussion of Current Research Issues 

The discussion of the statistical and research issues presented in this paper is focused on 
the underlying proposition that the analytical technique used by the researcher should not lead to a 
Type VI error. That is, the statistical technique should not be inconsistent with the research 
question (Newman, Deitchman, Burkholder, Sanders, & Ervin, 1976). The topics 
presented for discussion consist of the following: (a) distinguishing between statistical analysis 
issues and research design issues, (b) examining various concerns with the use of structural 
equation modeling, (c) considering both statistical and practical significance of results, (d) 
reflecting on when and how to control for inflated Type I error rates, and (e) distinguishing 
between research questions that require multivariate analyses and those that require univariate 
analyses. 

The tremendous analytical power of today’s computers makes it relatively easy for 
instructors to apply data management and analysis. Moreover, the point-and-click environment of 
software almost entices one to mechanically use statistics. While such ease of use lends efficiency 
to expert researchers’ work, it may only encourage thoughtlessness and lack of understanding 
among novices (both instructors and their students) who fail to grasp the complete meaning of 
the analyses they undertake. We believe that if instructors understand how issues of research 
design, statistical analysis, univariate and multivariate analyses, and practical and statistical 
significance are interrelated, they can select and use statistical techniques mindfully rather than mindlessly. 
As a result, they are then empowered to teach the concepts more effectively, and thereby decrease 
the likelihood of making Type VI errors in their own research and the research of their students. 
Issues Addressed by Statistical Analysis Versus Issues Addressed by Research Designs 
Issues addressed by the design of a study are distinct from issues addressed by the 
statistical analyses. Although this distinction may seem obvious, it is overlooked by too 
many researchers. In 1980 the United States Supreme Court based decisions in discrimination 
suits on statistical analyses that found significant relationships (i.e., correlation values) from 
studies that lacked sufficient research design controls of the types discussed by Fisher (1971), 
Newman and Newman (1994), and Campbell and Stanley (1963). Such legal decisions 
incorrectly treated the results of sophisticated statistical analyses as evidence of cause-effect 
relationships. 

All tests of significance are tests of relationships. Therefore, each one of those tests can be 
mathematically translated into the degree of relationship between the variables. Conversely, if one 
knows the sample size, all mathematical relationships can be converted into tests of significance. 

As noted by McNeil, Newman, and Kelley (1996), a t test is equivalent to a point-biserial 
correlation; an F test is equivalent to an eta value or an R value; and a 2 x 2 chi-square value is 
equivalent to a phi coefficient (see Table 1). The point-biserial correlation, phi, eta, t, R, F, and 
chi-square values are all related. Any test of significance merely gives a probability estimate that a 
relationship is not due to random variation. It yields no information about the causal nature of that 
relationship (i.e., the degree of internal validity). Only to the extent that a research design has 
total internal validity can one assume causality. 
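
The equivalences described above can be illustrated with a short sketch. The function names are hypothetical and are introduced here only for illustration; the formulas are the standard conversions for an independent-samples t test, a one-way F test, and a 2 x 2 chi-square test, and the t-to-r conversion agrees with the relationship shown in Table 1.

```python
import math


def t_to_r_pb(t, df):
    """Point-biserial correlation implied by an independent-samples t value."""
    r = math.sqrt(t ** 2 / (t ** 2 + df))
    return math.copysign(r, t)  # keep the sign of the mean difference


def r_pb_to_t(r, df):
    """t value implied by a point-biserial correlation and its degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


def f_to_eta_squared(f, df_between, df_within):
    """Eta-squared implied by a one-way ANOVA F value."""
    return (f * df_between) / (f * df_between + df_within)


def chi_square_to_phi(chi_square, n):
    """Phi coefficient implied by a 2 x 2 chi-square value with total sample size n."""
    return math.sqrt(chi_square / n)


if __name__ == "__main__":
    r = t_to_r_pb(2.5, 28)
    print("r_pb from t:", round(r, 4))
    print("t recovered from r_pb:", round(r_pb_to_t(r, 28), 4))
    print("eta squared from F:", round(f_to_eta_squared(4.0, 2, 27), 4))
    print("phi from chi-square:", round(chi_square_to_phi(9.0, 100), 4))
```

Because each conversion needs only the test statistic and the degrees of freedom or sample size, none of these values carries any information about the research design that produced the data.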

Concerns Related to the Use of Structural Equation Modeling 
Structural Equation Modeling (SEM) has many advantages. It forces the researcher 
to develop or work from a theoretical frame, which frequently improves the quality of the 
research, makes it more replicable, and allows it to add more effectively to the body of knowledge. 
That is, hypotheses are logically deduced from theory. If a logical, expected, theoretically driven 
relationship is found, these results are considered to be a stronger indicator of cause than if one 
found a relationship without theoretical or logical support. This strength has another side, 
however, one that warrants caution in drawing conclusions too quickly. That is, SEM may show 
very different models that imply different causal relationships with equally good 
goodness-of-fit index values. This is possible because SEM really tests how well the data fit the 
models, not necessarily theories, and models are not necessarily theoretically based. 

For all the strengths and mathematical sophistication of SEM, studies that employ it may suffer 
from Type VI errors, that is, inconsistency between the research question and the statistical 
analysis. Type VI errors can be caused by a number of problems. 

Lack of interaction terms in SEM. If the theory driving the study assumes interaction 
effects, an inconsistency between the analysis and the theory would exist if the researcher does 
not include interaction effects in the model. Evident in the published literature are too many 
studies by authors whose SEM analyses ignore the interaction effects in their theories. In 
addition, a researcher who is committed to using SEM may be less inclined to propose research 
questions that involve interaction effects. 

Researchers who use SEM often deal with interaction effects by analyzing separate models 
for variables that interact. This simple and elegant solution might work in situations in which 
few variables interact, the interaction is first order, and the variables are dichotomous 
rather than ordinal or interval. The interaction problem becomes considerably more 
complicated in situations in which multiple interactions exist, the variables are ordinal or interval 
in nature, and higher order (i.e., second, third, etc.) interactions exist. These situations would 
result in Type VI errors when SEM is used. We are aware of no mathematical model that can 
adequately assess such research questions (Newman & Marth, 1995). 
Information not provided by goodness-of-fit indexes. Many goodness-of-fit measures are 
available (e.g., CFI, NFI, NNFI, IFI, RFI, PNFI, PCFI, chi-square value, GFI, RMSR, RMSEA, 
chi-square divided by degrees of freedom, and AGFI). None of these measures, except for the 
chi-square value, is attached to a test of significance, and they frequently give different results. 
All of these goodness-of-fit measures, however, are based on a similar concept. That is, each of these 
measures assesses the model’s ability to reproduce the correlation matrix. 

This method of assessing model fit can lead to an incorrect assessment under two possible 
scenarios. First, it is possible for a model that contains numerous paths to produce a 
goodness-of-fit value that indicates a good level of fit in spite of the fact that none of the path 
values are statistically significant. Second, it is possible for the model to effectively reproduce the 
correlation matrix even when some of the relationships are in the opposite direction from the 
direction predicted by the theory. 

We believe it is logical to consider the number of path values that are statistically 
significant in the predicted direction. This method of assessing goodness of fit is not based on the 
model’s ability to reproduce the correlation matrix, but rather on its ability to reflect the path 
values as suggested by theory. This approach has been referred to as the Binomial Index of 
Model Fit (Fraas & Newman, 1994; Newman, Fraas, & Norfolk, 1995). 
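
The idea of counting paths that are significant in the predicted direction can be sketched as follows. This is only an illustration under stated assumptions and is not necessarily the procedure published by Fraas and Newman: the path records, the function names, and the choice to compare the count against a binomial distribution with a chance probability equal to alpha are all hypothetical.

```python
from math import comb


def count_supported_paths(paths, alpha=0.05):
    """Count paths that are significant and carry the sign predicted by theory."""
    return sum(
        1
        for p in paths
        if p["p_value"] < alpha and p["estimate"] * p["predicted_sign"] > 0
    )


def binomial_tail(successes, trials, p_chance):
    """Probability of at least `successes` hits in `trials` if each hit has probability p_chance."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical path estimates; predicted_sign is +1 or -1 according to the theory.
    paths = [
        {"estimate": 0.40, "p_value": 0.010, "predicted_sign": 1},
        {"estimate": -0.30, "p_value": 0.020, "predicted_sign": 1},   # significant, wrong direction
        {"estimate": 0.10, "p_value": 0.400, "predicted_sign": 1},    # predicted direction, not significant
        {"estimate": -0.50, "p_value": 0.001, "predicted_sign": -1},
    ]
    supported = count_supported_paths(paths)
    # Assumption: under chance, each path is "supported" with probability alpha.
    print("supported paths:", supported, "of", len(paths))
    print("binomial tail probability:", binomial_tail(supported, len(paths), 0.05))
```

Unlike an index based on reproducing the correlation matrix, a count of this kind penalizes both non-significant paths and paths whose sign contradicts the theory.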

SEM and confirmatory factor analysis. Most SEM is used to determine if the factors are 
good estimates of the theoretical factor structure. It is very possible that two factors fit the 
structure well and one does not. For such a case, the traditional goodness-of-fit techniques may 
simply indicate a poor fit without revealing which factors do not fit the structure. The Kaiser 
Factor Matching technique, however, would indicate which factors fit well (Newman, 
Dimitrov, & Waechter, 2000). According to these authors, Kaiser Factor Matching should be 
used along with the more traditional estimates to get a better picture of the relationships of the 
factors to the underlying theoretical constructs. Using more than one method of assessing the 
data is likely to give a more accurate picture and provide a better understanding of the data. 

SEM and measurement error. SEM purports to control for measurement error, a 
safeguard too hastily assumed by researchers who do not appreciate the complexities of the issue 
of error. An oversimplification of this procedure is that instead of using the indicator variables, 
SEM uses the underlying factors as predictor variables. The assumption is that the factors are 
stable and, therefore, that they control for measurement error. Although this procedure probably 
reduces measurement error, it does not eliminate it. The factors, moreover, may cause other types 
of errors because they may be quite sample specific. Thus, although it is almost never implemented, 
cross-validation of the model to estimate its stability is warranted. 

Incorporating practical significance levels. We believe it is important for researchers to 
formulate research questions that incorporate practical significance levels (e.g., effect sizes). 
Researchers who use SEM in connection with such research questions must develop 
goodness-of-fit measures that are capable of assessing modified questions such as “the difference 
between the means is not 0 but is 5 units” or “the relationship is not 0 but is .3.” 

Importance of Statistical Significance and Practical Significance 
The importance of effect size is a frequently discussed topic in today’s research literature 
(Kirk, 1996; Levin, 1996; Levin & Robinson, 2000; Robinson & Levin, 1997; Thompson, 1989b, 
1996, 1997, 1998, 1999a, 1999b, 1999c). Two important issues are currently being discussed. 
One issue relates to the question: How large should an effect size be in order for it to be 
practically significant? A second issue relates to the question: Should a researcher be concerned 
with practical significance, statistical significance, or both? 

Establishing the practical significance level. When dealing with the issue of how large 
an effect size should be in order for it to be considered practically significant, Newman and 
Newman (2000) presented the case that even small effect sizes, as measured by small R-squared 
values, may be important, that is, practically significant. They concluded that small R-squared 
values may be very valuable and useful depending on the research question being asked and the 
size of the population. This position, which is supported by Rosenthal and Rosnow (1991) and 
Deming (1982), suggests that practical significance should be measured in a relative sense rather 
than in an absolute manner. 

Incorporating the practical significance level into the hypotheses. Fraas and Newman 
(2000, 2001) and Robinson and Levin (1997) argued that consideration of either effect sizes or 
tests of significance alone is insufficient. One needs to consider both. Fraas and Newman 
(2000) discussed the testing of non-nil null hypotheses (i.e., hypotheses that incorporate non-zero 
values). The formulation of such hypotheses allows one to accurately reflect a research question 
that incorporates a practically significant value, thus avoiding a Type VI error. 

To illustrate the use of a non-nil null hypothesis, consider a study designed to test the 
effectiveness of two treatments intended to produce weight losses for cardiac patients. 
Researchers initially need to establish how large the difference between the weight-loss means of 
the two groups must be in order for it to be considered important, that is, practically significant. 
Fraas and Newman (2000) expressed the view that this process is probably best undertaken by 
involving practitioners and researchers in the field. 

For the purpose of this illustration, assume practitioners and researchers decided any difference 
between the group means greater than 10 pounds would be considered practically significant. For 
this practical significance level, the non-nil null hypothesis that Mean 1 minus Mean 2 equals 10 is 
more clinically meaningful than the nil null hypothesis that Mean 1 minus Mean 2 equals zero. 
That is, instead of saying we are 95% confident the difference is greater than zero, we can say we 
are 95% confident the difference is greater than 10. If experts determined that a difference of 10 
pounds was minimally required for clinical benefit, the latter statement is far more meaningful and 
useful. 
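
The contrast between the two hypotheses in this illustration can be written out as follows. This is a sketch under assumptions that are not stated in the original: a one-sided alternative, a pooled-variance independent-samples t statistic, and the symbols $\mu_1$, $\mu_2$, $\bar{X}_1$, $\bar{X}_2$, $s_p$, $n_1$, and $n_2$, which are introduced here only for illustration.

```latex
\begin{align*}
\text{Nil null:}\quad       & H_0: \mu_1 - \mu_2 = 0 \\
\text{Non-nil null:}\quad   & H_0: \mu_1 - \mu_2 = 10, \qquad H_1: \mu_1 - \mu_2 > 10 \\
\text{Test statistic:}\quad & t = \frac{(\bar{X}_1 - \bar{X}_2) - 10}
                                       {s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
                              \qquad df = n_1 + n_2 - 2
\end{align*}
```

The only change from the familiar nil test is that the practical significance level of 10 pounds is subtracted from the observed difference before it is divided by its standard error.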

The use of non-nil null hypotheses requires researchers and practitioners to identify what 
is clinically or pragmatically meaningful before initiating the research. The decisions determining 
the appropriate effect size need to be related to the purpose for doing the research. Newman, 
Ridenour, Newman, and DeMarco (in press) emphasize the importance of researchers framing 
their research question so it is consistent with their purpose. This clear understanding of purpose 
(i.e., who the outcome is intended to inform) is necessary for the researcher to identify the 
appropriate effect size. Without looking at the research question in the context of its purpose, 
one cannot identify an appropriate and useful clinical effect size. The incorporation of the effect 
size into the research question dictates that statistical significance is not considered independent of 
the practical significance level. 

Testing non-nil null hypotheses. Once the appropriate clinical effect size is identified, 
the non-nil null hypothesis must be tested. Fraas and Newman (2000) suggest that one technique 
that could be used to test a non-nil null hypothesis is a randomization test. A randomization test, 
which generates its own distribution, is used to determine if the observed difference adjusted for 
the practical significance level is unlikely to be due to chance variation. It should be noted, 
however, that Newman and Fraas (2001) also conducted a Monte Carlo study regarding the impact 
of including the practical significance level in the research question on the Type I error rates of 
independent t tests of two group means. The results revealed negligible impacts of such tests on 
the Type I error rates. Thus, non-nil null hypotheses may, under certain conditions, be tested with 
parametric tests. 
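
A randomization test of a non-nil null hypothesis might look like the following minimal sketch. It is not taken from Fraas and Newman (2000). It assumes that "adjusted for the practical significance level" can be implemented by subtracting the practical significance level from every score in the first group before the group labels are shuffled, and it assumes a one-sided alternative. The function name, the data, and the number of resamples are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomization_test_non_nil(group1, group2, delta, n_resamples=10000, seed=0):
    """One-sided randomization test of H0: mean1 - mean2 = delta vs. H1: mean1 - mean2 > delta.

    Assumption: under H0 the scores in group1, after subtracting delta,
    are exchangeable with the scores in group2.
    """
    rng = random.Random(seed)
    shifted = [x - delta for x in group1]
    observed = mean(shifted) - mean(group2)  # observed difference adjusted for delta
    pooled = shifted + list(group2)
    n1 = len(shifted)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimated p value is never zero.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical weight losses (pounds) for two treatments.
    treatment_1 = [30, 32, 35, 31, 34, 36, 33, 37]
    treatment_2 = [10, 12, 11, 13, 9, 14, 12, 10]
    adjusted_diff, p = randomization_test_non_nil(treatment_1, treatment_2, delta=10)
    print("adjusted difference:", adjusted_diff)
    print("estimated one-sided p value:", p)
```

A small p value here supports the statement that the difference exceeds 10 pounds, not merely that it exceeds zero.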

Statistical significance versus replication. A concept often overlooked when dealing with 
effect sizes and tests of significance is the replication of the findings. The fact that a study’s 
results are statistically significant does not mean they are replicable (i.e., tests of significance do 
not provide information regarding replication). Whenever possible, the ability to replicate one’s 
findings should be demonstrated through the use of cross-validation techniques applied (a) to the 
original sample and/or a new sample, (b) by different researchers, and (c) under different 
conditions. A substantial gap exists in the literature regarding the topic of replication. In contrast 
to the growing number of articles and presentations on the importance of tests of significance 
versus effect size, the importance of replication has received scant attention. We make a strong 
call for more emphasis on the reporting of replication estimates. 

Controlling Type I Error Rates 

Numerous authors discuss the need to control for inflated Type I error rates in studies that 
involve multiple statistical tests. Some express a major concern about the mechanical manner in 
which the various methods of controlling for Type I errors are applied and taught (Kirk, 1982; 
Newman, Croom, Mugrage, & Hoedt, 1983; Newman & Fry, 1972; Newman, Fraas, & Laux, 2000; 
Stevens, 1996; Toothaker, 1991). Newman, Fraas, et al. (2000) suggest a non-mechanical approach 
that requires the researcher to reflect on three elements of the adjustment process. First, the 
researcher needs to identify the error rate unit or units (i.e., the various groupings of tests for which 
adjustments are to be made). Second, adjustments are not made for tests that are directional and 
based on theory or previous results. Newman, Fraas, et al. suggest that if a researcher adjusts the 
alpha levels for tests predicted from previous experience or theory, the researcher will probably 
overcorrect for possible inflated Type I errors. Such an outcome would increase the likelihood of 
Type II errors being committed. Third, the alpha levels for the non-directional tests in a given 
error rate unit are adjusted for the number of such tests. 
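
The three elements can be sketched in code. This is a minimal illustration, not the authors' procedure: the source does not name a specific adjustment formula, so a Bonferroni-style division of alpha by the number of non-directional tests in each error rate unit is assumed here, and the class, field, and test names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlannedTest:
    name: str
    error_rate_unit: str                # element 1: the grouping this test belongs to
    directional_theory_based: bool      # element 2: directional and predicted from theory or prior results


def adjusted_alphas(tests, alpha=0.05):
    """Return the alpha level to use for each planned test.

    Assumption: non-directional tests share alpha equally within their
    error rate unit (Bonferroni-style); directional, theory-based tests
    keep the unadjusted alpha.
    """
    # Element 3: count only the non-directional tests in each error rate unit.
    counts = {}
    for t in tests:
        if not t.directional_theory_based:
            counts[t.error_rate_unit] = counts.get(t.error_rate_unit, 0) + 1

    result = {}
    for t in tests:
        if t.directional_theory_based:
            result[t.name] = alpha
        else:
            result[t.name] = alpha / counts[t.error_rate_unit]
    return result


if __name__ == "__main__":
    plan = [
        PlannedTest("achievement: treatment > control", "achievement", True),
        PlannedTest("achievement: school A vs. school B", "achievement", False),
        PlannedTest("achievement: morning vs. afternoon", "achievement", False),
        PlannedTest("attitude: treatment vs. control", "attitude", False),
    ]
    for name, level in adjusted_alphas(plan).items():
        print(f"{name}: alpha = {level}")
```

Writing the plan down in this form forces the researcher to state the error rate unit and the basis for each test before any adjustment is applied.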

When we ask advanced research students with whom we have worked whether they controlled for 
inflated Type I error rates due to multiple tests of significance, they generally indicate they have. 
If we further ask them which method they used, why they chose that method, and what the error 
rate unit is, we seldom get an answer that indicates a thoughtful approach. The procedures they 
used tend to have been selected and applied mechanically, and the students have difficulty 
justifying the procedure used. The approach outlined by Newman, Fraas, et al. (2000) may 
encourage students to reflect on the rationale of the adjustment process rather than mechanically 
implementing a given technique. 

Multivariate Tests Versus Univariate Tests 
According to the authors of many statistics texts, when one has multiple dependent 
variables, multivariate analysis should be employed. The common reason for using multivariate 
analyses in such cases is that these analyses control for inflated Type I error rates. Unfortunately, 
this recommendation is frequently misunderstood since the overall multivariate test of significance 
is only one aspect of the multivariate testing procedure. When a significant multivariate test is 
followed by univariate tests to determine where the significant differences lie, a control for 
multiple tests is necessary to keep the Type I error rate from being inflated (Croom, 1986; 
Newman, Croom, Mugrage, & Hoedt, 1983). The most important consideration in determining 
whether one should use multivariate or univariate measures should be how well either univariate 
or multivariate analyses reflect the question of interest (Newman & Benz, 1987). If the question 
of interest is not related to the underlying construct of a set of dependent variables, but is 
instead related to each dependent variable, univariate analyses should be used. Multivariate 
analyses should only be used when one is interested in the underlying hypothetical construct of the 
composite set of dependent variables, which is very rarely the case (Jeremy Finn, personal 
communication, October 2001). 

An additional point should be made regarding the use of multivariate analyses. It appears 
to be a well accepted fact that all multivariate analysis of variance techniques are subsets of 
canonical analyses, just as traditional analysis of variance techniques are subsets of multiple 
regression analyses. (Note that canonical analysis is sometimes called multivariate regression.) In 
a canonical correlation the statistical analysis determines how well the underlying construct of a 
set of predictor variables predicts the underlying constructs of a set of criterion variables. It does 
not, however, answer the question: How well do these individual predictor variables predict these 
individual dependent variables simultaneously? If one is interested in that question and conducts a 
multivariate analysis, then a Type VI error is being made (i.e., an inconsistency exists between the 
research question and the statistical analysis). 

Summary 

Researchers must realize that today’s computer statistical software allows them to easily 
provide extremely complicated statistical analyses of large and complicated databases. They must 
be cognizant, however, of a potential danger that such software generates. That is, researchers 
may allow the sophisticated analytical techniques contained in the computer software, such as 
multivariate techniques, to dictate how their analyses will be conducted. Such mechanical 
thinking may lead to an inconsistency between the research question and statistical analysis, that 
is, to a Type VI error. Not only are we advocates for thoughtful conceptualizing of 
one’s research question, we also strongly believe the type of data analysis used is inexorably 
dictated by that question. 

The quality of research will be enhanced if practicing and future educational researchers 
along with the consumers of that research (a) understand that a study’s degree of internal validity 
is addressed by its research design and not by its statistical analyses; (b) reflect on the advantages 
and disadvantages afforded researchers who use structural equation modeling as their analytic 
technique; (c) realize the need to consider both statistical and practical significance; (d) 
understand the rationale used for adjusting or not adjusting the alpha levels of multiple statistical 
tests; and (e) comprehend the differences between research questions that require the application 
of multivariate tests and those that should be analyzed with univariate tests. The issues addressed 
in this paper may cause researchers, graduate students, and consumers of research to reflect on 
the need to match one’s analytical methods with the research question. Such reflection may be 
one step on the path that leads to higher quality research in education. 
Table 1 

All Tests of Significance are Tests of Relationships 

Test             Measure of Relationship 

t test           $r_{pb}$; that is, $t^2 = \frac{r_{pb}^2 \, df}{1 - r_{pb}^2}$ 

Z test           $r_{pb}$ (point biserial) 

F test           $\eta^2$ (eta); could also be measured by $R^2$ 

$\chi^2$ test    $\phi$ or $\phi^2$ (phi coefficient) when df = 1; contingency coefficient when df is 
                 greater than 1 